Supplementary Materials: Accelerated Stochastic Gradient Descent for Minimizing Finite Sums
1 Proof of Proposition 1

We now prove Proposition 1, which gives a condition for the compactness of the sublevel set.

Proof. Let $B(r)$ and $S(r)$ denote the ball and the sphere of radius $r$, centered at the origin. By an affine transformation, we may assume that $X^*$ contains the origin $O$, that $X^* \subset B(1)$, and that $X^* \cap S(1) = \emptyset$. Then, for all $x \in S(1)$,
$$ (\nabla f(x), x) \;\ge\; f(x) - f(O) \;>\; 0, $$
where the first inequality uses convexity and the second uses $O \in X^*$ and $x \notin X^*$. Let $\alpha$ denote the minimum value of $(\nabla f(x), x)$ on $S(1)$. Since $(\nabla f(x), x)$ is positive and continuous on the compact set $S(1)$, this minimum is attained, so $\alpha > 0$. For any $r \ge 1$ and any $x \in S(r)$, set $\hat{x} = x/r \in S(1)$; since $x - \hat{x} = (r-1)\hat{x}$, it follows that
$$ f(x) \;\ge\; f(\hat{x}) + (\nabla f(\hat{x}), x - \hat{x}) \;\ge\; f(\hat{x}) + (r-1)(\nabla f(\hat{x}), \hat{x}) \;\ge\; f^* + (r-1)\alpha. $$
This inequality implies that if $r > 1 + (c - f^*)/\alpha$, then $f(x) > c$ for all $x \in S(r)$. Therefore, the sublevel set $\{x \in \mathbb{R}^d : f(x) \le c\}$ is contained in the ball $B\bigl(1 + (c - f^*)/\alpha\bigr)$ and, since $f$ is continuous, it is closed; hence it is a closed and bounded set.

2 Proof of Lemma 1

To prove Lemma 1, we need the following lemma, which is also shown in [1].

Lemma A. Let $\{\xi_i\}_{i=1}^{n}$ be a set of vectors in $\mathbb{R}^d$ and let $\mu$ denote the average of $\{\xi_i\}_{i=1}^{n}$. Let $I$ denote a uniform random variable representing a size-$b$ subset of $\{1, 2, \ldots, n\}$. Then, it follows that
$$ \mathbb{E}_I \left\| \frac{1}{b} \sum_{i \in I} \xi_i - \mu \right\|^2 \;=\; \frac{n-b}{b(n-1)} \cdot \frac{1}{n} \sum_{i=1}^{n} \left\| \xi_i - \mu \right\|^2. $$
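Assuming Lemma A is, as written above, the standard variance identity for a mini-batch of size $b$ sampled uniformly without replacement, the short Python sketch below checks the constant $\tfrac{n-b}{b(n-1)}$ numerically: it computes the left-hand side exactly by enumerating every size-$b$ subset and compares it with the right-hand side. The sizes $n$, $b$, $d$, the random vectors, and all variable names are illustrative choices, not taken from the paper.

```python
# Sanity check (illustrative sketch, not from the supplementary material):
# verify the without-replacement mini-batch variance identity of Lemma A
# by exhaustive enumeration of all size-b subsets.
import itertools

import numpy as np

rng = np.random.default_rng(0)
n, b, d = 8, 3, 5                    # population size, mini-batch size, dimension (arbitrary)
xi = rng.normal(size=(n, d))         # the vectors {xi_i}
mu = xi.mean(axis=0)                 # their average

# Left-hand side: E_I || (1/b) sum_{i in I} xi_i - mu ||^2, computed exactly
# because every size-b subset I is equally likely under uniform sampling.
lhs = np.mean([
    np.sum((xi[list(I)].mean(axis=0) - mu) ** 2)
    for I in itertools.combinations(range(n), b)
])

# Right-hand side: (n - b) / (b (n - 1)) * (1/n) * sum_i ||xi_i - mu||^2.
rhs = (n - b) / (b * (n - 1)) * np.mean(np.sum((xi - mu) ** 2, axis=1))

print(f"LHS = {lhs:.10f}, RHS = {rhs:.10f}")
assert np.isclose(lhs, rhs)          # the two sides agree up to floating-point error
```

The exact enumeration is feasible only for small $n$; note that for the full-batch case $b = n$ the right-hand side vanishes, which matches the fact that the mini-batch average then equals $\mu$ deterministically.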